[python] support read after data evolution updating by shard #7157
JingsongLi merged 20 commits into apache:master from
Conversation
    row_tracking_enabled: bool,
-   system_fields: dict):
+   system_fields: dict,
+   requested_field_names: Optional[List[str]] = None):
You should just use `fields: List[DataField]`?
Updated
"""Ensure _ROW_ID and _SEQUENCE_NUMBER are not null (per SpecialFields)."""
fields = []
for field in schema:
    if field.name == SpecialFields.ROW_ID.name or field.name == SpecialFields.SEQUENCE_NUMBER.name:
Why can it be nullable?
A bug here: the nullable info of the row-tracking system fields is lost during `_assign_row_tracking`. Opened a separate PR #7174 to fix it.
same order for that shard.
- **Parallelism**: run multiple shards by calling `new_shard_updator(shard_idx, num_shards)` for each shard.

## Read After Partial Shard Update
I feel like this document doesn't make much sense
Removed
).slice(0, min_rows)
columns.append(column)
else:
    columns.append(pa.nulls(min_rows, type=self.schema.field(i).type))
This work should be done in DataFileBatchReader?
else:
    field = self.schema_map.get(name)
    inter_arrays.append(
        pa.nulls(num_rows, type=field.type) if field is not None else pa.nulls(num_rows)
I don't get it; FormatPyArrowReader has already handled read_fields.
JingsongLi left a comment
I ran your test and tried to fix it. All I needed to do was modify FormatPyArrowReader's `out_fields.append(pa.field(field_name, pa.null(), nullable=True))`: do not pass the null type; pass the correct type to fix it.
My bad, fixed by updating
JingsongLi left a comment
Thanks @XiaoHongbo-Hope ! Looks good to me.
Problem
When the user updates a column for only one shard (e.g. `ShardTableUpdator` runs shard 0 only and writes a new column `d`), a full table read fails: only that shard's files have the new column; other files do not. Concatenating batches then hits a schema mismatch and crashes. To fix the issue, we support data evolution shard read.
Tests
API and Format
Documentation